Data Visualization Project 02

Load the tidyverse, sf, and plotly libraries:

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.2
## Warning: package 'tibble' was built under R version 4.3.2
## Warning: package 'tidyr' was built under R version 4.3.2
## Warning: package 'readr' was built under R version 4.3.3
## Warning: package 'purrr' was built under R version 4.3.2
## Warning: package 'dplyr' was built under R version 4.3.2
## Warning: package 'forcats' was built under R version 4.3.3
## Warning: package 'lubridate' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sf)
## Warning: package 'sf' was built under R version 4.3.3
## Linking to GEOS 3.11.2, GDAL 3.8.2, PROJ 9.3.1; sf_use_s2() is TRUE
library(plotly)
## Warning: package 'plotly' was built under R version 4.3.2
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

Load the raw data into environment:

# Navigate to the data folder and choose the babynames file
search_babynames <- file.choose()
# Read the file using the readRDS() function (the data is stored as an R object, not a CSV)
babynames <- readRDS(search_babynames)
# Read the Florida lakes shapefile (.shp)
Lake_shapes <- read_sf(file.choose())
# Load the htmlwidgets library (provides saveWidget())
library(htmlwidgets)
# Count babies named Jorge per year, then build a scatter plot and assign it a name
n_jorge <- babynames %>%
  filter(name=="Jorge") %>%
  group_by(year) %>%
  summarize(Total=sum(n)) %>%
  rename(Year="year")
p1 <- ggplot(n_jorge, mapping = aes(x = Year, y = Total)) +
  geom_point() +
  labs(title = "Number of Babies Named Jorge",
       x = "Year",
       y = "Total Babies") +
  theme_minimal()
#Make it interactive
interactive_plot <- ggplotly(p1)
interactive_plot
#Save as self-contained HTML
saveWidget(interactive_plot, "fancy_plot.html")
#Fit a linear model and get the coefficients
model <- lm(Total~Year,data = n_jorge)
coefficients <- summary(model)$coefficients
print(coefficients)
##                Estimate  Std. Error   t value     Pr(>|t|)
## (Intercept) -69047.6867 3776.661519 -18.28273 2.922173e-34
## Year            35.7732    1.922239  18.61017 7.077525e-35
#Add a loess smoother (with its confidence band) on top of the scatter plot
p3 <- p1 +
  geom_smooth(method = "loess")
p3
## `geom_smooth()` using formula = 'y ~ x'

p2<-ggplot() +
  geom_sf(data = Lake_shapes, aes(fill = PERIMETER),color="black") +
  scale_fill_viridis_c() +
  labs(title = "Florida Lakes Perimeters",
       fill = "Perimeter Length (ft)") +
  theme_minimal()
p2

#Make it interactive
#interactive_plot <- ggplotly(p2)
#interactive_plot
#Save as self-contained HTML
#saveWidget(interactive_plot, "Lake Perimeters.html")

Summary of Findings

I started by trying to plot the voting data from another data frame but stopped because I wanted to create something more unique. I decided to create a trend graph of how many babies were named Jorge in each year. I then fit a linear model to the data to visualize the growth trend, but realized that a straight line is not flexible enough to follow the shape of the data. I then used the ‘loess’ method in the geom_smooth function to add more curvature, but some points still fall outside the confidence band of the smoother. Overall, the plot shows the main idea: the number of Jorges increased steadily and is now starting to decline. In practice, I have met far fewer Georges or Jorges in the past couple of years than when I was little, so it checks out! It is sad that the convention of Jorges may be empty in a few years, though.

I was also able to map the perimeters of the lakes in Florida. The map shows that the biggest lakes are mostly found in the south and central regions of the state. This may be tied to the fact that the land is flatter in central and south Florida due to the filling and erosion that happen because we are a peninsula.
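
The comparison between the straight-line fit and the loess smoother can be checked numerically as well as visually. The sketch below is only an illustration: it uses a simulated rise-then-decline series (the `toy` data frame is hypothetical and is not the real Jorge counts) and compares the two fits by root-mean-square error. It also shows why points can fall outside a shaded band: a confidence band describes uncertainty in the fitted mean, while a prediction band is what is expected to cover individual observations.

```r
# Hypothetical stand-in for n_jorge: a simulated rise-then-decline series,
# NOT the real baby name counts.
set.seed(42)
years <- 1920:2020
toy <- data.frame(
  Year  = years,
  Total = 3000 * exp(-((years - 1990) / 30)^2) + rnorm(length(years), sd = 100)
)

# Straight-line fit, same form as lm(Total ~ Year) in the report
fit_lm <- lm(Total ~ Year, data = toy)
# Local regression with stats::loess(), the function behind method = "loess"
fit_loess <- loess(Total ~ Year, data = toy)

# Root-mean-square error of each fit on the toy data
rmse <- function(fit) sqrt(mean(residuals(fit)^2))
rmse_lm    <- rmse(fit_lm)
rmse_loess <- rmse(fit_loess)

# Confidence band (for the fitted mean) vs. prediction band (for single points)
conf_band <- predict(fit_lm, newdata = toy, interval = "confidence")
pred_band <- predict(fit_lm, newdata = toy, interval = "prediction")

# Share of observed points that fall inside each band
inside <- function(band) {
  mean(toy$Total >= band[, "lwr"] & toy$Total <= band[, "upr"])
}
inside_conf <- inside(conf_band)
inside_pred <- inside(pred_band)

print(c(rmse_lm = rmse_lm, rmse_loess = rmse_loess))
print(c(inside_conf = inside_conf, inside_pred = inside_pred))
```

On the real data, the same comparison could be run by replacing `toy` with `n_jorge`. Points outside the band drawn by geom_smooth are not by themselves a sign of a bad fit, since that band is a confidence band for the smoothed mean rather than a prediction band.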